A hybrid prognosis model is being developed for real-time estimation of the residual useful life of metallic
aircraft structural components. The prognosis framework combines information from off-line physics-based,
off-line data-driven, and on-line system-identification-based predictive models. The present paper focuses on
the latter two components of the integrated hybrid prognosis model. Both components use a data-driven Gaussian
process approach within a Bayesian framework. Fatigue crack behavior of Aluminum 2024 compact-tension (CT)
specimens under variable loading has been modeled using this multivariate Gaussian process technique. The
Gaussian process model maps the input space to an output space by probabilistically inferring the underlying
non-linear function relating input and output. For off-line prediction, the input space of the model is trained
with parameters that affect fatigue crack growth, such as the number of fatigue cycles, minimum load, maximum
load, and load ratio. For on-line prediction, the input space is instead trained with features extracted from
piezoelectric sensor signals, because loading parameters are difficult to measure in a real flight-worthy
structure. In both the off-line and on-line cases, the output space is trained with the known associated crack
lengths. Once the Gaussian process model is trained, it is used to predict the crack length or damage state for
a new input whose output is not known. The concepts are validated through several numerical examples.
I. Introduction

Aircraft maintenance must balance labor, logistic, and equipment budget constraints with the competing
requirements of fleet readiness, reliability, and safety. Recently [1], stringent diagnostic, prognostic, and
health management capability requirements have been placed on new applications, such as the Joint Strike
Fighter (JSF), in order to enable and reap the benefits of new and revolutionary logistic support concepts.
Although Prognostics and Health Management (PHM) [2] is the name given to the capability being developed by JSF
to enable the vision of Autonomic Logistics and so meet the overall affordability and supportability goals,
similar PHM systems can be developed and implemented for residual life prognostics and health management of any
civil, mechanical, or aerospace structure. Research on integrated PHM systems is currently very active;
however, these efforts are generally at an incipient stage with respect to the stringent requirements of a
truly effective PHM system. Current aerospace practice follows a damage-tolerant reliability engineering model
in which structural components are regularly inspected and replaced. The replaced components have not
necessarily reached the end of their designed strength. These practices unnecessarily add to overhaul
expenditure and time. Damage-tolerant reliability engineering designs are generally based on either a
physics-based fracture mechanics approach or a data-driven stochastic approach. The physics-based damage
tolerance approach, which is widely practiced [3-6] and constantly being improved, is based primarily on linear
elastic fracture mechanics (LEFM) as far as fatigue failure is concerned. Another form of damage tolerance
approach, known as a life usage model [7,8], is also widely used. It gathers statistical information, collected
from a large population sample, on how long a component endures before failure, and uses these statistics to
make remaining life predictions for individual components. However, these predictions are not based on measured
characteristics of the individual components. In addition, the fatigue life of aircraft structural components
under service loading is often analyzed and predicted based on crack growth rates obtained from
constant-amplitude fatigue test data [9,10]. In contrast to fatigue crack growth under constant-amplitude
loading, crack growth under variable-amplitude loading is characterized by retardation and acceleration effects
[11], which extend or reduce the lifetime of structures. Many physics-based models [3-6,12,13] with empirical
parameters are currently available to model crack growth with retardation and acceleration effects. These
models reasonably capture the dynamics of fatigue crack growth under variable loading in a deterministic
framework. However, they do not explicitly model the uncertainty in crack growth that arises from scatter in
micro-structural properties, or the subsequent propagation of that uncertainty through loading sequence effects.

Many studies to date, in applications ranging from medicine [14] to aerospace [15], show the effective use of
neural networks for diagnostic and prognostic systems. However, few of these neural network models are based on
an explicit uncertainty quantification approach such as Bayesian uncertainty modeling. It is noted that Bayesian
methods allow [16] complex neural network models to be used without fear of the "overfitting" that can occur
with traditional neural network learning methods. However, the Bayesian analysis of neural networks is difficult
[17] because a simple prior over the weight parameters of the network implies a complex prior over the
underlying functions and, hence, becomes computationally intractable. The present paper discusses the use of the
Gaussian process approach [17,18] for a prognostic system that explicitly incorporates Bayesian uncertainty into
the predictive model. A Gaussian process (GP) model simplifies the Bayesian analysis of neural networks by
assuming that the random variables, indexed by an infinite (countable or continuous) index set, are jointly
Gaussian. The Gaussian process model maps the input space to an output space by probabilistically inferring the
underlying non-linear function relating input and output.

In the present paper, both off-line and on-line predictions of fatigue crack growth in compact tension (CT)
samples are made using a multivariate Gaussian process model. For off-line prediction, the input space of the
model is trained with parameters that affect fatigue crack growth, such as the number of fatigue cycles, minimum
load, maximum load, and load ratio. In turn, the output space is linked to the corresponding crack lengths or
crack growth rates. The Gaussian process models the scatter in fatigue crack growth that arises from
microstructural variability, loading uncertainty, and variability due to manufacturing tolerance. Once the
Gaussian process is trained with a known input-output data set, it can predict the output crack length or its
rate under the particular loading envelope. For on-line prediction, the model input space is trained using
features extracted from piezoelectric sensor signals rather than loading parameters, whereas the output space is
trained with the corresponding crack lengths.
II. Technical Approach

The following section outlines a general approach for a hybrid prognosis model, with a detailed description of
the off-line data-driven predictive model and the system-identification-based on-line predictive model.
A. General Overview on Hybrid Prognosis Model

A hybrid prognosis model based on both physics-based and data-driven techniques is being developed. A schematic
of the hybrid prognosis model is shown in Fig. 1. The overall prognosis model will have three modules: an
off-line physics-based model, an off-line data-driven probabilistic model, and a system-identification-based
on-line predictive model. All three modules will ultimately be integrated into a single hybrid prognosis model.
The off-line physics-based predictive model will be based on a non-linear fracture mechanics approach, whereas
the off-line data-driven model and the on-line system identification model will be based on a probabilistic
Gaussian process approach.

The off-line data-driven model will explicitly model the macro-level uncertainty that arises from
microstructural variability, loading uncertainty, etc., and adaptively complement the off-line physics-based
model for any unmodeled exogenous influences. The physics-based model combined with the data-driven
probabilistic model will be used for off-line prediction of the residual useful life of a structural component
under an anticipated flight envelope, whereas the system-identification-based on-line predictive model will
estimate the current damage state in real time and make this information available to the off-line module,
which reassesses the residual useful life of the component based on that real-time information. The present
paper discusses only the predictive capability of the data-driven off-line predictive model and the
system-identification-based on-line predictive model.

Figure 1. A conceptual hybrid prognostic model (source of the fighter plane image: http://www.jsf.mil).


B. Gaussian Process Based Off-line Data Driven Predictive Model


The goal of the Gaussian process data-driven prediction model is to compute the distribution of the damage state
at future instances for which the damage-affecting physical parameters are known. Typical damage-affecting
physical parameters include the loading pattern, number of flight cycles elapsed, initial damage size,
environmental conditions, and grain size distribution. The Gaussian process assumes that the scatter in crack
growth (shown schematically in Fig. 2) is due to variation in these parameters. Because of the scatter in crack
length, or damage state, at any instance the damage state follows a distribution rather than being
deterministic. The Gaussian process assumes each of these individual distributions to be Gaussian, with
different mean and variance, and evaluates the conditional distribution of a future damage state (the (N+1)th
test damage state depicted in Fig. 2) as $f(a_{N+1} \mid D = \{x_i, a_i\}_{i=1}^{N}, x_{N+1})$.

Figure 2. Gaussian process combines individual distributions at various instances of time.

The objective is to compute the probability distribution of the damage state $a_{N+1}$ given a test input
$x_{N+1}$ and a set of 'N' training points $D = \{x_i, a_i\}_{i=1}^{N}$, where $\{a^{(i)}\}_{i=1}^{N}$ is the
i th random variable at the i th fatigue cycle. Let us define our damage state data vector $a_N$ to be a
Gaussian process with a kernel matrix $K_N$. By doing this we have bypassed [17,18] the step of expressing
individual priors on the noise and the modeling function, by combining both priors into the kernel matrix
$K_N$. Now the Gaussian distribution over $a_{N+1}$ can be written as

$$f(a_{N+1} \mid D, K_N(x_i, x_j), x_{N+1}, \Theta) = \frac{1}{Z} \exp\left(-\frac{(a_{N+1} - \hat{a}_{N+1})^2}{2\sigma^2_{\hat{a}_{N+1}}}\right) \tag{1}$$


where Z is an appropriate normalizing constant, and the mean and variance of the new distribution are,
respectively, defined as

$$\hat{a}_{N+1} = k_{N+1}^{T} K_N^{-1} a_N, \qquad \sigma^2_{\hat{a}_{N+1}} = \kappa - k_{N+1}^{T} K_N^{-1} k_{N+1} \tag{2}$$

In Eq. (2), $a_N$ is the $(1 \times N)$ training output vector, which here is the crack length, whereas
$\kappa$, $k_{N+1}$, and $K_N$ are the partitioned components of the kernel matrix $K_{N+1}$ of the (N+1)th
instance and can respectively be described as

$$\kappa = k(x_{N+1}, x_{N+1}); \quad k_{N+1} = \left[k(x_{N+1}, x_i)\right]_{i=1,2,\ldots,N}; \quad K_N = \left[k(x_i, x_j)\right]_{i,j=1,2,\ldots,N} \tag{3}$$
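For reference, the partition mentioned above can be written out explicitly. This is the standard block form of a Gaussian process kernel matrix; the symbols $\kappa$, $k_{N+1}$, and $K_N$ are assumed here to carry the meanings given in Eqs. (2) and (3).

```latex
K_{N+1} =
\begin{bmatrix}
K_N & k_{N+1} \\
k_{N+1}^{T} & \kappa
\end{bmatrix},
\qquad
K_N \in \mathbb{R}^{N \times N},\quad
k_{N+1} \in \mathbb{R}^{N \times 1},\quad
\kappa \in \mathbb{R}.
```

Conditioning the joint zero-mean Gaussian over $(a_N, a_{N+1})$ with covariance $K_{N+1}$ on the observed $a_N$ yields the mean and variance of Eq. (2); the variance is the Schur complement of $K_N$ in $K_{N+1}$.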

In Eq. (3), k is the assumed kernel function. There are many possible choices of prior kernel function. From a
modeling point of view, the objective is to specify a prior kernel that encodes our assumptions about the
structure of the process being modeled. Formally, we are required to specify a function that will generate a
positive definite kernel matrix for any set of inputs. A simple non-stationary neural-network-based [18] kernel
function is used for the current Gaussian process model and is given by

$$k(x_i, x_j) = \Theta_1 \sin^{-1}\left(\frac{x_i^{T} \Theta_2 x_j}{\sqrt{(1 + x_i^{T} \Theta_2 x_i)(1 + x_j^{T} \Theta_2 x_j)}}\right) + \Theta_3 \tag{4}$$

The hyperparameters $\Theta_i$ (i = 1, 2, 3) are adjusted to maximize the log likelihood L, given by

$$L = -\frac{1}{2}\log\det K_N - \frac{1}{2} a_N^{T} K_N^{-1} a_N - \frac{N}{2}\log 2\pi \tag{5}$$

These hyperparameters are initialized to reasonable values, and the conjugate gradient method is then used to
search for their optimal values. Initially, the kernel function given in Eq. (4) is evaluated using the assumed
initial hyperparameters and the input space vectors $x_i$. Each input space vector $x_i$ is a 'd'-dimensional
vector whose elements contain the values of the different fatigue-affecting parameters at the i th instance,
where 'd' is the number of fatigue-affecting physical parameters.
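The prediction step of Eqs. (2) to (5) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, $\Theta_2$ is treated as a single scalar weight, a small diagonal `noise` term is added for numerical stability, and the training data are synthetic.

```python
import numpy as np

def nn_kernel(X1, X2, theta):
    """Neural-network style kernel in the spirit of Eq. (4).

    X1 has shape (n1, d), X2 has shape (n2, d).
    theta = (theta1, theta2, theta3); theta2 is a scalar here (assumption).
    """
    theta1, theta2, theta3 = theta
    cross = theta2 * (X1 @ X2.T)
    norm1 = 1.0 + theta2 * np.sum(X1 * X1, axis=1)
    norm2 = 1.0 + theta2 * np.sum(X2 * X2, axis=1)
    return theta1 * np.arcsin(cross / np.sqrt(np.outer(norm1, norm2))) + theta3

def gp_predict(X, a, x_new, theta, noise=1e-4):
    """Predictive mean and variance of Eq. (2) for one test input."""
    K = nn_kernel(X, X, theta) + noise * np.eye(len(X))   # K_N (plus assumed noise)
    x_new = np.atleast_2d(x_new)
    k = nn_kernel(X, x_new, theta)[:, 0]                  # k_{N+1}
    kappa = nn_kernel(x_new, x_new, theta)[0, 0] + noise  # kappa
    mean = k @ np.linalg.solve(K, a)
    var = kappa - k @ np.linalg.solve(K, k)
    return mean, var

def log_likelihood(X, a, theta, noise=1e-4):
    """Log likelihood of Eq. (5) for the training outputs a."""
    K = nn_kernel(X, X, theta) + noise * np.eye(len(X))
    _, logdet = np.linalg.slogdet(K)
    n = len(a)
    return -0.5 * logdet - 0.5 * a @ np.linalg.solve(K, a) - 0.5 * n * np.log(2.0 * np.pi)

# Synthetic, illustrative data only (not taken from the experiments).
X = np.linspace(0.1, 1.0, 8).reshape(-1, 1)   # one input variable, N = 8
a = np.log(1.0 + 3.0 * X[:, 0] ** 2)
a = a - a.mean()                              # zero-mean outputs
theta = (1.0, 1.0, 0.1)                       # assumed initial hyperparameters

mean, var = gp_predict(X, a, np.array([0.55]), theta)
L = log_likelihood(X, a, theta)
```

In practice `log_likelihood` would be handed to an optimizer (the paper uses conjugate gradients) to tune `theta` before calling `gp_predict`.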

C. Gaussian Process Based On-line Data Driven Predictive Model

In the previous section we discussed off-line prediction with a Gaussian process input space generated from
loading change information and other fatigue parameters. In real time, however, it is hardly possible to measure
loading or compliance change information in a component that is already assembled into full-scale flight
hardware. This is because load cells are usually large and heavy, and in a multiaxial loading environment it is
impossible to measure the loading change information. However, small piezoelectric sensors can be mounted on the
full-scale hardware to measure equivalent loading and associated compliance change information in real time.
These sensor signals can be obtained at regular instances and, after the required processing, fed to the
Gaussian process input space to map the input space to the output space of crack length or crack growth rate.
Once the training input space and training output space are formed, Eqs. (1) to (5) give the predicted mean and
variance for a new test input. A general architecture for on-line prediction is shown in Fig. 3. The figure
shows the multiple sensor signals that are collected at different instances of a fatigue loading envelope. These
signals form the signal space for preprocessing with a feature extraction algorithm such as principal component
analysis (PCA) or kernel principal component analysis (KPCA). The feature extraction algorithm statistically
denoises the original signal and generates feature vectors ranked according to their information content. Once
the feature vector is found, it can be fed to the Gaussian process input space to map the original sensor signal
to the corresponding crack length or crack growth rate. The details of the feature extraction algorithms are
discussed below.









Figure 3. General architecture for on-line predictive model (panel labels: Signal Space, M×d; PCA or KPCA)


Principal Component Analysis (PCA) 

Principal component analysis [19,20] is an orthogonal basis transformation that has been widely used for
multivariate data analysis and dimension reduction. Intuitively, PCA identifies the directions of the principal
components, along which the variance of changes in dynamics is maximum. Assuming 'M' different observations,
each with 'd' dimensions (as described in Fig. 3), each input signal space vector $y_p$ is an M × 1 vector. Then
the centered d × d covariance matrix C of the data set $\{y_p \in R^M \mid p = 1, 2, \ldots, d\}$ can be found as

(6)


Then the covariance matrix is diagonalized to obtain the principal components. The diagonalization can be
performed by solving the following eigenvalue problem:

$$\lambda v = C v \tag{7}$$

The coordinates in the eigenvector basis are called principal components. The size of an eigenvalue $\lambda_p$
corresponding to an eigenvector $v_p$ of the covariance matrix C equals the amount of variance in the direction
of $v_p$. Furthermore, the directions of the first 'n' eigenvectors, corresponding to the 'n' largest
eigenvalues, cover as much variance as possible by 'n' orthogonal directions. For the present on-line prediction
problem, the first eigenvector is considered to be the most pristine quantity, carrying the maximum buried
information, and is called the principal feature vector. This d × 1 feature vector corresponds to the i th
instance's input vector $x_i$ of the Gaussian process input space.
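The PCA feature extraction step can be illustrated with a short NumPy sketch. It assumes the M observations are stacked as rows of an M × d array, that the covariance is normalized by M (the normalization does not change the eigenvectors), and that the signals are synthetic; the function name is hypothetical.

```python
import numpy as np

def pca_first_feature(Y):
    """Return the largest eigenvalue, its eigenvector, and the covariance matrix.

    Y has shape (M, d): M repeated observations of a d-sample signal.
    """
    Yc = Y - Y.mean(axis=0, keepdims=True)   # center each of the d columns
    C = (Yc.T @ Yc) / Y.shape[0]             # d x d covariance (1/M is an assumption)
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1], C

# Synthetic signal space: a common waveform with varying amplitude plus noise.
rng = np.random.default_rng(0)
M, d = 20, 50
base = np.sin(2.0 * np.pi * 3.0 * np.arange(d) / d)
amplitude = 1.0 + 0.5 * rng.standard_normal(M)
Y = amplitude[:, None] * base[None, :] + 0.01 * rng.standard_normal((M, d))

lam, feature, C = pca_first_feature(Y)       # feature is the d x 1 principal feature vector
```

The sign of an eigenvector is arbitrary, so a consistent sign convention may be needed before feature vectors from different instances are compared.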

Kernel Principal Component Analysis (KPCA) 

One cannot assert that linear PCA will always detect all structure in a given data set. By the use of suitable
nonlinear features, one can extract more information. Kernel PCA (KPCA) [21] is well suited to extracting
non-linear structures in the data. Kernel PCA extends the above-mentioned PCA approach and performs principal
component analysis in a high-dimensional space. For this purpose, the original data given by
$\{y_p \in R^M \mid p = 1, 2, \ldots, d\}$ are first mapped into a high-dimensional space F via a (usually
nonlinear) mapping $\Phi$, and then a linear PCA is performed on the mapped data. The d × d covariance matrix of
the new mapped data set can be found as

$$K_{ij} = (\Phi(x_i), \Phi(x_j)) = k(x_i, x_j) \tag{8}$$

There are many forms of kernel function. For the present feature extraction problem, a radial basis function
(RBF) kernel is used, which has the following form:

$$k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{\Theta}\right) \tag{9}$$

where the hyperparameter $\Theta$ is assumed constant. Then the kernel matrix is diagonalized to obtain the
principal components. The diagonalization can be performed by solving the following eigenvalue problem:

$$\lambda v = K v \tag{10}$$

where $\lambda$ and $v$ are, respectively, the eigenvalue and eigenvector. As described for the PCA-based
feature extraction technique, the first feature vector from every kernel PCA analysis can be used to feed the
Gaussian process input space.
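A corresponding sketch for the KPCA feature is given below. It follows the description above literally, building a d × d RBF kernel matrix between the d data vectors and taking its leading eigenvector. The kernel matrix is not centered in feature space, which is an assumption made to match the text; the function names, the value of the hyperparameter, and the random data are illustrative only.

```python
import numpy as np

def rbf_kernel_matrix(Y, theta):
    """d x d matrix with entries exp(-||y_p - y_q||^2 / theta), as in Eq. (9).

    Y has shape (M, d); column p of Y is the data vector y_p in R^M.
    """
    P = Y.T                                              # shape (d, M), one row per y_p
    sq = np.sum(P * P, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (P @ P.T)  # squared pairwise distances
    dist2 = np.maximum(dist2, 0.0)                       # guard against rounding below zero
    return np.exp(-dist2 / theta)

def kpca_first_feature(Y, theta):
    """Largest eigenvalue and eigenvector of the kernel matrix, Eq. (10)."""
    K = rbf_kernel_matrix(Y, theta)
    eigvals, eigvecs = np.linalg.eigh(K)                 # ascending eigenvalues
    return eigvals[-1], eigvecs[:, -1], K

# Illustrative random data standing in for the M x d signal space.
rng = np.random.default_rng(1)
Y = rng.standard_normal((20, 50))
lam, feature, K = kpca_first_feature(Y, theta=40.0)      # theta chosen arbitrarily
```

The choice of the hyperparameter controls how quickly similarity decays with distance, so it would normally be set with the scale of the sensor signals in mind.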
III. Numerical Verification


Numerical studies have been conducted to verify the effectiveness of the developed prediction model by using 
experimental data obtained from fatigue tests conducted in-house. The details of the experiment and numerical 
studies are discussed below. 

A. Fatigue Test Experiment 

The fatigue experiments were performed using an Instron 1331 servo-hydraulic load frame operating at 20 Hz.
Eighteen Al 2024-T351 compact tension (CT) samples, each 6.31 mm thick, were used. The specimens were fabricated
according to ASTM E647-93 with an average width of 25.53 mm (from the center of the pin hole to the edge of the
specimen) and an average height of 30.6 mm. To simulate typical flight maneuvering conditions, a variable load
spectrum was programmed into the digital controller of the load frame. The spectrum, as coded to the load cell,
is shown in Fig. 4. However, due to noise and compliance effects, the load actually applied to the structure was
somewhat different from that programmed into the controller. The minimum and maximum loads applied to the
structure were measured through the load cell and are used to construct the Gaussian process input space. It
must be noted that for each sample the frame was stopped and the corresponding load cell outputs were recorded
at chosen instances during the experiment. Each time the test was stopped, a high-resolution picture of the
cracked sample was taken using a digital camera to generate the output space crack length data. The
corresponding measured crack lengths for the different samples are depicted in Fig. 5.

Figure 4. Loading spectrum programmed to load frame controller

Figure 5. Experimental crack length data

The CT samples are named from ct416a to ct436a. The prefix 'ct' stands for compact tension, the middle number is
the sample number, and the suffix 'a' indicates that the initial notch is made along the rolling direction of
the aluminum plate. The absence of some numbers, such as ct422a (Fig. 5), implies that data could not be
collected for those samples due to premature failure of the specimen. Out of the 18 samples for which fatigue
crack length data were available, four samples (ct417a, ct419a, ct421a, and ct423a) were instrumented with a
piezoelectric sensor and an actuator. A typical instrumented sample after the fatigue test is shown in Fig. 6.
When the test frame was stopped, the piezoelectric actuator was excited with a narrow-band burst signal. The
corresponding sensor signal was collected using a National Instruments (NI) data acquisition system. These
collected signals are used as input to the on-line predictive model, which is discussed in detail later. The CT
specimens were not removed from the test frame and were kept under a static tensile load during sensor data
collection to simulate realistic on-board conditions. At each stopping instant, 150 sensor observations were
made, of which 100 were selected using a Fast Fourier Transform (FFT) filter. These 100 observations were used
as inputs to the PCA or KPCA algorithms for feature extraction at those instances.
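The FFT-based selection of observations can be sketched as follows. The paper specifies a 140 ± 30 kHz acceptance band, a 2 MS/s sampling rate, and 1000 samples per signal; everything else here is an assumption for illustration: the function name is hypothetical, the "central frequency" is approximated by the location of the spectral peak, and the test signals are synthetic tones rather than measured data.

```python
import numpy as np

def select_in_band(signals, fs, f_lo, f_hi, n_keep):
    """Keep the first n_keep signals whose spectral peak lies in [f_lo, f_hi].

    signals has shape (n_obs, n_samples); fs is the sampling rate in Hz.
    Returns the kept signals and the peak frequency of every observation.
    """
    freqs = np.fft.rfftfreq(signals.shape[1], d=1.0 / fs)
    spectra = np.abs(np.fft.rfft(signals, axis=1))
    spectra[:, 0] = 0.0                               # ignore any DC offset
    peak = freqs[np.argmax(spectra, axis=1)]          # proxy for central frequency
    mask = (peak >= f_lo) & (peak <= f_hi)
    return signals[mask][:n_keep], peak

# Synthetic observations: six 135 kHz bursts, two low-frequency and one
# high-frequency disturbance (illustrative stand-ins for noisy readings).
fs, n = 2.0e6, 1000
t = np.arange(n) / fs
window = np.hanning(n)
bursts = [window * np.sin(2.0 * np.pi * 135e3 * t + 0.1 * i) for i in range(6)]
pump = [np.sin(2.0 * np.pi * 20e3 * t) for _ in range(2)]
high = [np.sin(2.0 * np.pi * 400e3 * t)]
signals = np.array(bursts + pump + high)

kept, peak = select_in_band(signals, fs, 110e3, 170e3, n_keep=5)
```

With 1000 samples at 2 MS/s the frequency resolution is 2 kHz, which is fine compared with the 60 kHz width of the acceptance band.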

B. Gaussian Process Off-line Prediction 


The Gaussian process input spaces are constructed using the load cell readings at different instances and the
fatigue cycles at those instances. Each individual $x_i$ is a d × 1 vector with elements comprising fatigue
cycles, minimum load, maximum load, and load ratio. The individual $x_i$ form the d × N input training space,
and the corresponding 1 × N observed crack lengths (from high-resolution images) form the training output space.
The dimension of the input space can be varied to any number, but for the present study it is restricted to one
for single-variate predictions and four for multivariate predictions. For a new test instance with known input,
the corresponding mean output crack length is predicted using Eq. (2). Before the data are used for GP
prediction, the input and output space variables are logarithmically scaled. In addition, because a zero-mean
Gaussian process is assumed as described in Eq. (1), the output crack lengths are scaled to have zero mean.

Figure 6. Instrumented CT specimen

Both single-variate and multivariate predictions of output crack lengths are made for all 18 CT specimens (ref.
Fig. 5). When a particular sample is selected as the test specimen, the data from the remaining seventeen
samples are used to train the Gaussian process model. Figures 7A and 7B show the single-variate and multivariate
GP predictions of crack length, respectively. Comparing Figs. 7A and 5, it can be seen that even though the
single-variate GP, with time (number of cycles) as the only input variable, is able to predict an average crack
growth curve, it is unable to track the transient jumps in crack growth that arise from transient overload
cycles (Fig. 4). The only exception is sample ct419a, where a good prediction is observed (ref. Fig. 7A).
However, comparing Figs. 7B and 5, it can be seen that the multivariate GP captures those transient loading
effects for most of the samples.

Figure 7A. Single variate prediction

Figure 7B. Multi variate prediction

To further improve the GP off-line prediction, a rate-based prediction has also been performed. In this case,
rather than predicting the crack length directly, the crack growth rate is predicted first and then integrated,
using time as the only input variable, to estimate the corresponding crack length. From a numerical perspective,
it is better to predict a derivative than to predict an integral (here, the crack length) for a highly
non-smooth function. Also, from a physical perspective, fatigue crack growth is normally expressed in terms of a
first-order nonlinear rate equation. Therefore, although the direct crack-length-based GP model is capable of
capturing the nonlinearity, the rate-based prediction captures the first-order crack growth rate and helps
capture the physics of a dynamic system. However, it must be noted that for rate-based crack growth estimation,
the cycle-by-cycle rate has to be integrated for the most accurate crack length prediction. In the present case,
where experimental data were available only at discrete instances, the crack growth rate can be predicted only
at those instances for which the input space variables are available from experimental observation.
Nevertheless, to estimate a continuous crack length curve in the cycle-by-cycle integration process, when the
crack growth rate is not available at a particular cycle, the crack growth rate of the previous cycle is assumed
for the current cycle.

Figure 8A. Predicted rate from multivariate GP

Figure 8B. Rate from experimental observation

Figure 9A. Continuous crack length from integration of GP estimated rate

Figure 9B. Discrete crack lengths (at experimental stopping instances) captured from Fig. 9A

A comparison of the GP prediction of crack growth rate with the crack growth rate observed from experiments can
be seen in Figs. 8A and 8B. The figures show that, with the exception of a slightly lower predicted rate
compared to the experimental rate, there is a good correlation between predicted and observed rates in the
context of transient overloading. The slight underprediction in rate compared to experimental values is
attributed to the lack of continuous experimental data, which led to the use of an averaging technique for rate
estimation over large numbers of cycles. Figure 9A shows the continuous crack length estimated for different
samples, and Fig. 9B shows the crack lengths at discrete instances (where experimental data were collected)
captured from Fig. 9A. Comparing Figs. 9A and 5, it is found that the prediction accuracy improves over the
direct crack-length-based prediction shown in Fig. 7B.
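The cycle-by-cycle integration with the "previous rate is held" rule described above can be sketched in plain Python. The function name and all numerical values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not experimental data.

```python
def integrate_rate(stop_cycles, rates, a0, query_cycles):
    """Integrate a piecewise-constant crack growth rate da/dN.

    stop_cycles: increasing cycle counts at which a rate prediction exists.
    rates: predicted rate (mm/cycle) at each stop; each rate is held until
           the next stop, and the last rate is held beyond the final stop.
    a0: crack length (mm) at stop_cycles[0].
    query_cycles: cycle counts at which the crack length is wanted.
    """
    lengths = []
    for q in query_cycles:
        a = a0
        for k, n_k in enumerate(stop_cycles):
            if q <= n_k:
                break
            # End of the interval over which rates[k] applies.
            n_next = stop_cycles[k + 1] if k + 1 < len(stop_cycles) else q
            a += rates[k] * (min(q, n_next) - n_k)
        lengths.append(a)
    return lengths

# Illustrative values only.
stop_cycles = [0, 1000, 2000]
rates = [1e-5, 1e-4, 2e-5]          # mm/cycle
crack = integrate_rate(stop_cycles, rates, a0=5.0, query_cycles=[500, 1500, 2500])
```

Because the rate is constant between stops, summing rate times interval length gives the same result as stepping through every cycle, at much lower cost.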

C. Gaussian Process On-line Prediction 

For the on-line Gaussian process prediction, the input vector $x_i$ in Eq. (4) is fed with sensor signal
features found using either principal component analysis or kernel principal component analysis. At each of the
'N' observation instants (Eqs. 1-5), the piezoelectric actuator shown in Fig. 6 is actuated with a burst signal.
The burst signal, shown in Fig. 10, has a central frequency of 135 kHz and a sampling frequency of 2 MS/s. For
each actuation, 150 sensor observations are obtained for probabilistic feature extraction. Of the 150 sensor
observations, only 100 are used for feature extraction. These 100 observations are selected using a Fast Fourier
Transform filter that retains a sensor signal only if its central frequency lies in the range 140 ± 30 kHz. The
±30 kHz limits are based on the assumption that the frequency variation of the observed signal will not cross
these limits over the fatigue loading envelope. This ensures that low-frequency noise due to the hydraulic pump
of the fatigue frame and high-frequency noise due to other environmental factors are not modeled in the feature
extraction process. Once the first 100 sensor signals, of 1000 samples each, are selected, they are used as
input to the feature extraction algorithm. With this information, the values of 'M' and 'd' in the expression
$\{y_p \in R^M \mid p = 1, 2, \ldots, d\}$ become M = 100 and d = 1000. The reason for selecting 100 similar
observations for feature extraction is to statistically select the best features from the original signal, which
may contain environmental noise. Using the above-mentioned signals, the covariance matrix (Eq. 6) for PCA and
the kernel matrix (Eq. 8) for KPCA are evaluated. The covariance matrix for PCA and the kernel matrix for KPCA
at a typical observation instant are presented in Figs. 11 and 12, respectively.

Figure 10. Burst signal applied through PZT actuator

Figure 11. Hierarchical feature extraction and crack growth prediction using PCA and GP

Figure 12. Hierarchical feature extraction and crack growth prediction using KPCA and GP


The covariance matrix is used to solve the eigenvalue problem described in Eq.(7) for PC A based feature 
extraction, and the kernel matrix is used to solve the eigenvalue problem described in Eq.(9) for KPCA based 
feature extraction. From the eigenvalue analysis the first eigenvector is selected as the feature vector. It must be 
noted that the feature extraction is done at each discrete instant where the fatigue frame was stopped to collect data. 
This leads to ‘N’ (Eq. 1-5) number of feature vectors, each of length d =1000. Once the feature vectors are obtained 
for different instances, the Gaussian process input and output space are formed. The input training space of size 
dxN is formed from the ‘N’ feature vectors. The corresponding training output space consists of 6 N’ observed 
crack length, which are used in the off-line prediction. Once the Gaussian process training input and output spaces 
are formed the prediction of an unseen output state, here the crack length or crack growth rate, is made using Eq. 2- 
5. Unlike the log scaling of both input and output spaces used for the off-line prediction, for the on-line prediction 
the input space is not scaled, but the output space is scaled with zero mean, unit variance scaling. This type of 
scaling is performed to ensure that both the input and the output spaces have similar variances, though not 
necessarily the same. It is noted that the Gaussian process works well when the distribution of underlying variables 
have similar mean and variances. Typical hierarchical results found at different stages of the on-line prediction 
process are shown in Figs. 11 and 12. The results for direct crack length predicted using PCA based feature 
extraction technique is shown in Fig. 11, and results with the KPCA based feature extraction technique is shown in 
Fig. 12. The figures indicate that the match between experiment and prediction is not as good for all the fatigue 
loading envelopes as seen for the off-line prediction, but there is the same qualitative trend between experiment and 
prediction, particularly during the transient fatigue loading phase. This is because during the transient load regime 
larger numbers of data points are available (Fig.5) compared to the lower load regime. During the lower load regime 
the crack growth is assumed stable and hence fewer data points are collected over this regime. In addition, the crack 
growth rate during lower load regime is approximately one and half order of magnitude less compared to the 
average crack growth rate during the transient high load regime. For example for the ct423a sample, the average 
crack growth rate during the lower load regime is approximately 1.041 xlO" 5 mm/cycles, whereas the approximate 
average rate during the third high load regime (87e3 to 88e3 cycles) is 1.4 xlO" 4 mm/cycles. This possibly leads to a 
Gaussian process scaling mismatch and subsequent erroneous prediction in the lower load regime. The scaling 
issues will be addressed during our future work. Crack growth rates are also predicted using the Gaussian process 
based on-line predictive model. The crack growth rates are predicted using both PCA and KPCA based features and
are shown in Figs. 13A and 13B, respectively. The figures indicate that unlike the case of direct crack growth 
prediction, in the crack growth rate prediction there is better correlation between experimental and predicted rates. 
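
The zero mean, unit variance output scaling described above can be illustrated with a minimal sketch. It assumes a squared-exponential kernel with hand-picked hyperparameters and purely synthetic data; the function names (standardize, sq_exp_kernel, gp_predict) are hypothetical, and the posterior expressions are the standard Gaussian process regression formulas rather than a reproduction of Eqs. 2-5. The inputs are left unscaled, as in the on-line case, and the predictions are mapped back to physical units at the end.

```python
import numpy as np

def standardize(y):
    """Scale targets to zero mean and unit variance; also return the statistics."""
    y = np.asarray(y, dtype=float)
    mu, sigma = y.mean(), y.std()
    return (y - mu) / sigma, mu, sigma

def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_test, length_scale=0.3, noise_var=1e-4):
    """Posterior mean and variance of a GP trained on standardized targets.

    Hyperparameters are fixed by hand here (an assumption); in practice they
    would be inferred from the training data.
    """
    y_s, mu, sigma = standardize(y_train)
    K = sq_exp_kernel(X_train, X_train, length_scale)
    K += noise_var * np.eye(len(X_train))
    K_star = sq_exp_kernel(X_test, X_train, length_scale)

    # Solve with a Cholesky factor instead of forming the inverse explicitly.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_s))
    mean_s = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var_s = 1.0 + noise_var - (v**2).sum(axis=0)

    # Undo the output scaling so the prediction is in physical units.
    return mean_s * sigma + mu, var_s * sigma**2

# Synthetic stand-in for (feature, crack length) pairs; not experimental data.
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
a = 5.0 + 2.0 * X[:, 0]
a_mean, a_var = gp_predict(X, a, X)
```

Standardizing only the output keeps the unit prior variance of the kernel comparable to the spread of the targets, which is one way to read the variance-matching argument made in the text.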
Figure 13A. Rate prediction based on PCA features 



Figure 13B. Rate prediction based on KPCA features 


It is also found that both feature extraction algorithms produce similar results for crack growth rate prediction,
except in the case of the ct423a specimen. For this specimen, the GP prediction of the rate correlates better with
experiment compared to PCA based GP prediction. The KPCA based GP prediction could possibly be improved by
using a non-stationary kernel (Eq. 4) as opposed to the stationary kernel (Eq. 9) used for the present feature
extraction problem. Once the crack growth rates are predicted using GP, the continuous crack length can be 
estimated via cycle by cycle integration of the predicted rates. In the absence of rate information at any particular 
cycle, a strategy similar to that for the off-line prediction is employed to estimate the corresponding crack length. 
However, it is noted that for the on-line prediction case, while estimating the crack length in the lower load regime,
the integration algorithm is modified to select the minimum of the GP predicted rate and a rate of 1.041 × 10^-5
mm/cycle. This value is found from experimental observations in the lower load regime of the ct423a sample,
but is of similar value for other samples. The purpose of using this value in the integration process is to avoid using
a spurious rate predicted by the GP, particularly in the lower load regime. The continuous crack lengths
integrated from the predicted rates are shown in Figs. 14A and 14B for the PCA and KPCA based predictions, respectively.
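
The capped integration described above can be sketched as follows. This is only an illustration under stated assumptions: the function name integrate_crack_length and the low_load_mask argument are hypothetical, the load regime of each cycle is assumed to be known, and the rate is held constant between successive points (a forward-Euler step), which may differ from the strategy actually used when rate information is missing.

```python
import numpy as np

# Rate cap quoted in the text for the lower load regime (mm/cycle).
LOW_LOAD_RATE_CAP = 1.041e-5

def integrate_crack_length(a0, cycles, rates, low_load_mask, cap=LOW_LOAD_RATE_CAP):
    """Integrate predicted growth rates da/dN into a crack length history.

    a0            : initial crack length (mm)
    cycles        : increasing cycle counts at which a rate is available
    rates         : GP-predicted da/dN (mm/cycle) at those cycle counts
    low_load_mask : True where the cycle lies in the lower load regime
    """
    cycles = np.asarray(cycles, dtype=float)
    rates = np.asarray(rates, dtype=float)
    mask = np.asarray(low_load_mask, dtype=bool)

    # In the lower load regime, use the smaller of the predicted rate and the cap.
    capped = np.where(mask, np.minimum(rates, cap), rates)

    # Forward Euler: the rate at the start of each interval is held constant.
    increments = capped[:-1] * np.diff(cycles)
    return a0 + np.concatenate(([0.0], np.cumsum(increments)))

# Illustrative values only; not taken from the experiments.
lengths = integrate_crack_length(
    a0=10.0,
    cycles=[0, 1000, 2000, 3000],
    rates=[5e-5, 5e-6, 1.4e-4, 1.4e-4],
    low_load_mask=[True, True, False, False],
)
```

Because the cap is applied only where the mask is set, predicted rates in the high load regime pass through unchanged, while an overestimated rate in the lower load regime cannot inflate the integrated crack length.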








Figure 14A. Continuous crack growth estimation based on PCA features 


Figure 14B. Continuous crack growth estimation based on KPCA features 
Figures 11, 12, and 14 show that for both the PCA and the KPCA based feature extraction techniques, the rate-based 
prediction of crack length growth has better correlation with experiments compared to the direct GP prediction of 
crack growth. 


IV. Concluding Remarks 

A hybrid prognosis framework has been developed by integrating off-line data driven and on-line system 
identification based predictive models. The procedure has been used to investigate the fatigue crack behavior of Al 
2024 compact-tension (CT) specimens under variable loading, modeled using a multivariate Gaussian process 
approach within a Bayesian framework. The input space of the model is trained with parameters such as number of 
fatigue cycles, minimum load, maximum load, and load ratio for the off-line prediction. The input space is trained 
using features obtained from piezoelectric sensor signals for the online prediction. In both off-line and on-line 
cases, the output space is trained with known associated crack lengths. Predictions are conducted using the trained 
Gaussian process model and the results are validated with experiments. Some important observations from this 
study are as follows: 

1. The numerical results indicate that the multivariate off-line prediction model outperforms the single 
variate prediction model. 

2. The rate based Gaussian process crack growth prediction captures the physics of the dynamic 
system under variable loading conditions better than the direct crack length prediction. 

3. The on-line architecture developed is used to evaluate the performance of PCA and KPCA based feature 
extraction algorithms. It is found that in some cases the PCA based prediction algorithm gives better 
predictions, whereas in others the KPCA based prediction performs better. The consistency of the kernel 
based algorithm in giving better predictions will be investigated in future work. 

4. The on-line prediction algorithms are used to validate a realistic situation, where the component under 
investigation is an integral part of a larger structural assembly (in the present case the fatigue frame). 

5. In the present framework, using the Bayesian probabilistic approach, the uncertainty due to 
loading and microstructural parameters is explicitly modeled in the prediction algorithm. In the future, 
the off-line Gaussian process model will be combined with a physics-based model to incorporate 
uncertainty arising from modeling error, microstructural variability, loading uncertainty, and 
manufacturing tolerance. 